Validating utf-8 character strings in javascript regular expression

los
Hi,

I've created a web application using Struts. On one of the forms I want
to allow values containing letters from other languages (accented and
other non-ASCII characters), but not symbols such as (, <, +, }, etc.
Finding a regular expression that handles these values is proving quite
hard. Right now I have
^([a-zA-Z0-9_\x81-\xFF])*$ and this works for some UTF-8 characters
such as ã, é, ó, etc., but doesn't work for other characters such
as Æ, Ü, ß, etc.

I was wondering if someone has come across this issue and has found a
solution for the problem.

Thanks,

-Los

Sep 20 '05 #1
ASM
los wrote:
> Hi,
>
> I've created a web application using struts. I am trying to solve an
> issue where in one of the forms where I want to allow the values
> inserted to be special characters from other languages, but not symbols
> such as (, <, +, }, etc... Creating the regular expression that
> handles these values is becoming quite hard to find. Right now I have
> ^([a-zA-Z0-9_\x81-\xFF])*$ and this works for some utf-8 characters
> such as ã, é, ó, etc... But doesn't work for other characters such
> as Æ, Ü, ß, etc...
>
> I was wondering if someone has come across this issue and has found a
> solution for the problem.

If you are using the ISO-8859-1 encoding:

? ^([a-zA-Z0-9_\xA0-\xFF])*$
? ^([a-zA-Z0-9_\x A0-\x FF])*$
? ^([a-zA-Z0-9_/A0/-/FF/])*$
? ^([\x61-\x7A\x41-\x5A\x30-\x39\x5F\xA0-\xFF])*$

? ^([/61/-/7A/41/-/5A/30/-/39//5F/A0-/FF/])*$

--
Stephane Moriaux et son [moins] vieux Mac
Sep 20 '05 #2
los
What if we don't want to restrict the input to just ISO-8859-1 characters?
What if we want to allow all of the UTF-8 characters?

I tried doing something like ^([a-zA-Z0-9_\x0080-\xFFFF])*$ and it
didn't work.

-Los

Sep 21 '05 #3
ASM
los wrote:
> What if we don't want to restrict to just ISO-8859-1 characters? What
> if we want to be all of the UTF-8 characters?

Because \x?? is not UTF-8; it is a hexadecimal character code,
and the hexadecimal code of a given character is not the same in every charset.

Example:
non-breaking space = \xA0 (hex) = 00A0 (Unicode) = C2 A0 (UTF-8)
non-breaking space = hex A0 in charsets ISO-8859-1 & CP1252
non-breaking space = hex FF in charsets CP850 & CP437

http://www.miakinen.net/vrac/c10/charsets
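
To illustrate the distinction between a character's code point and its UTF-8 bytes, here is a minimal sketch. It assumes an engine that provides TextEncoder (current browsers and Node.js; nothing like it existed when this thread was written), and the variable names are only illustrative.

```javascript
// U+00A0 (non-breaking space) held in a JavaScript string: one UTF-16 code unit.
const nbsp = "\u00A0";

// The code point is what \xA0 or \u00A0 in a regular expression refers to.
const codePointHex = nbsp.codePointAt(0).toString(16).toUpperCase();

// The UTF-8 byte sequence only exists once the string is encoded
// (on the wire or on disk), not inside the script engine.
const utf8Hex = Array.from(new TextEncoder().encode(nbsp))
  .map((byte) => byte.toString(16).toUpperCase().padStart(2, "0"))
  .join(" ");

// A regular expression is matched against code units, never against UTF-8 bytes.
const matchesCodePoint = /^\xA0$/.test(nbsp);
const matchesUtf8Bytes = /^\xC2\xA0$/.test(nbsp);

console.log(codePointHex, "|", utf8Hex, "|", matchesCodePoint, matchesUtf8Bytes);
```

So a pattern should be written in terms of code points, whatever encoding the page itself is served in.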

> I tried doing something like ^([a-zA-Z0-9_\x0080-\xFFFF])*$ and it
> didn't work.

? ^([a-zA-Z0-9_/0080/-/FFFF/])

I think it could be:

0081 to 00FF (Unicode)
or
C2A0 to C3BF (UTF-8)
from:
http://www.macchiato.com/unicode/chart/
or the other URL above.

--
Stephane Moriaux et son [moins] vieux Mac
Sep 21 '05 #4
los
Thanks for the reply!

I tried your approach, but for some reason the JavaScript parser still
doesn't recognize the UTF-8 characters.

Could someone please verify that the correct regex should be
^([a-zA-Z0-9_\u00A1-\uFFFF])*$ ?

If I use the above regex in my XML, the JavaScript that gets
generated on the web page contains the following rule:

this.mask=/^([a-zA-Z0-9_\\u00A1-\\uFFFF])*$/;

I apologize if this is a basic question, but I'm new at this and am
learning as I go along.

Thanks,

-Los

Sep 21 '05 #5
los wrote:
> I tried your approach but for some reason the javascript parser doesn't
> recognize the utf-8 characters still.
>
> Could someone please verify that the correct regex should be
> ^([a-zA-Z0-9_\u00A1-\uFFFF])*$ ?

It should not. Firstly, Unicode escapes need to be supported, which is not
the case with every script engine. Test it like

/\u00A1/.toString().length < 4 ? supported : unsupported

Secondly, using the asterisk (`*') quantifier means that the expression also
matches the empty string; you should use the plus (`+') quantifier instead.

Thirdly, you have to specify which Unicode characters you consider to be
"symbols". For example, including code points 0x00A1 to 0xFFFF as above
would also include the range 0x2100 to 0x214F (Letterlike Symbols).
See <http://unicode.org/> and <http://pointedears.de/scripts/test/charset>
for details.
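
The three points above can be checked with a short sketch. The behaviour-based escape test and the `\p{...}` pattern are assumptions added here, not something proposed in this thread: Unicode property escapes need the `u` flag and an ES2018-capable engine, and the choice of categories (letters, marks, numbers) is only one possible definition of "not a symbol".

```javascript
// 1. Behaviour-based check: does the \u escape in a RegExp literal match U+00A1?
const unicodeEscapesSupported = /^\u00A1$/.test(String.fromCharCode(0xA1));

// 2. The pattern from the question uses '*', so it also accepts the empty string.
const withAsterisk = /^([a-zA-Z0-9_\u00A1-\uFFFF])*$/;
const withPlus = /^([a-zA-Z0-9_\u00A1-\uFFFF])+$/;

// 3. The broad range lets Letterlike Symbols through, e.g. U+2122 (trade mark sign).
const tradeMark = "\u2122";

// Assumed alternative: describe what is allowed by Unicode category instead of
// by range (\p{L} letters, \p{M} combining marks, \p{N} numbers).
const byCategory = /^[\p{L}\p{M}\p{N}_]+$/u;

const checks = {
  unicodeEscapesSupported,
  asteriskAcceptsEmpty: withAsterisk.test(""),
  plusAcceptsEmpty: withPlus.test(""),
  rangeAcceptsTradeMark: withPlus.test(tradeMark),
  categoryAcceptsTradeMark: byCategory.test(tradeMark),
  // "\u00C6\u00DC\u00DF" is the string of the letters AE ligature, U umlaut, sharp s.
  categoryAcceptsLetters: byCategory.test("\u00C6\u00DC\u00DF"),
  categoryAcceptsParen: byCategory.test("("),
};

console.log(checks);
```

Whether digits, underscores or combining marks belong in the allowed set is a decision for the form in question; the sketch only shows that a category-based class excludes symbols that the range \u00A1-\uFFFF lets in.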

> If I use the above regex in my xml, in the javascript that gets
> generated on the web page I get the following rule;
>
> this.mask=/^([a-zA-Z0-9_\\u00A1-\\uFFFF])*$/;

Aside from the fact that this would match the empty string as well, that
is quite obviously a RegExp completely different from the one above.
Escaping the backslash includes it as a literal character in the
character class, along with all the following characters of the former
escape sequence (here: u, 0, A, 1, F).

What you possibly want is

this.mask = new RegExp("^([a-zA-Z0-9_\\u00A1-\\uFFFF])+$");

where the escaped backslashes collapse to single ones before being
passed to the RegExp constructor, so that the result is equivalent to the
first RegExp literal (apart from the quantifier). But you should rather
configure your server-side code generator not to escape escape sequences.
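
The difference between the generated literal and the constructor call can be seen in a small sketch (variable names are illustrative). One detail worth noting: in the literal, the sequence `1-\\` also forms an accidental range from "1" (0x31) to "\" (0x5C), which is why "<" slips through.

```javascript
// What the code generator emitted: in a RegExp *literal*, "\\" is a literal
// backslash, so the class holds '\', 'u', '0', 'A', 'F' and the range '1'-'\'.
const generatedMask = /^([a-zA-Z0-9_\\u00A1-\\uFFFF])*$/;

// In a *string* literal, "\\u" collapses to "\u" before the RegExp constructor
// sees it, so the class really contains the range U+00A1 to U+FFFF.
const intendedMask = new RegExp("^([a-zA-Z0-9_\\u00A1-\\uFFFF])+$");

const eAcute = "\u00E9"; // e with acute accent

const maskResults = {
  generatedAcceptsAccent: generatedMask.test(eAcute),
  generatedAcceptsLessThan: generatedMask.test("<"),
  intendedAcceptsAccent: intendedMask.test(eAcute),
  intendedAcceptsLessThan: intendedMask.test("<"),
};

console.log(maskResults);
```

The generated mask thus rejects accented letters while accepting some of the very symbols it was meant to block.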
PointedEars
Oct 16 '05 #6
JRS: In article <11****************@PointedEars.de>, dated Sun, 16 Oct
2005 18:41:31, seen in news:comp.lang.javascript, Thomas 'PointedEars'
Lahn <Po*********@web.de> posted :
> los wrote:

ON 21 SEPTEMBER
AISB, your attribution line does not comply with the minimum current
Usenet thinking - this is not news:de.* here, as you should know.

>> I tried your approach but for some reason the javascript parser doesn't
>> recognize the utf-8 characters still.
>>
>> Could someone please verify that the correct regex should be
>> ^([a-zA-Z0-9_\u00A1-\uFFFF])*$ ?
>
> It should not. Firstly, Unicode escapes needs to be supported which is not
> the case with every script engine. Test it like

One had hoped that the turd who thinks it useful to disinter aged
threads had himself passed on to another place.

--
© John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v4.00 MIME. ©
Web <URL:http://www.merlyn.demon.co.uk/> - FAQish topics, acronyms, & links.
Proper <= 4-line sig. separator as above, a line exactly "-- " (SonOfRFC1036)
Do not Mail News to me. Before a reply, quote with ">" or "> " (SonOfRFC1036)
Oct 16 '05 #7
